Diabetic retinopathy (DR) is a complication of diabetes and one of the major causes of vision impairment in the global population. Because the early-stage manifestation of DR is usually very mild and hard to detect, an accurate diagnosis via eye screening is clinically important for preventing vision loss at later stages. In this work, we propose an ensemble method to automatically grade DR using ultra-wide optical coherence tomography angiography (UW-OCTA) images available from the Diabetic Retinopathy Analysis Challenge (DRAC) 2022. First, we adopt state-of-the-art classification networks, i.e., ResNet, DenseNet, EfficientNet, and VGG, and train them to grade UW-OCTA images with different splits of the available dataset. Ultimately, we obtain 25 models, of which the top 16 are selected and ensembled to generate the final predictions. During training, we also investigate a multi-task learning strategy and add an auxiliary classification task, Image Quality Assessment, to improve model performance. Our final ensemble model achieved a quadratic weighted kappa (QWK) of 0.9346 and an Area Under the Curve (AUC) of 0.9766 on the internal testing dataset, and a QWK of 0.839 and an AUC of 0.8978 on the DRAC challenge testing dataset.
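As a concrete illustration of the ensembling step, the following sketch averages the per-model softmax outputs of the selected models and evaluates the result with QWK, the challenge's ranking metric. The averaging scheme and variable shapes are assumptions for illustration; the abstract only states that the top 16 of 25 models are ensembled.

```python
# Minimal sketch of softmax averaging over the selected models, plus QWK.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def ensemble_grade(prob_list):
    """Average per-model softmax outputs and take the argmax grade.

    prob_list: list of arrays, each of shape (n_images, n_grades),
               one array per selected model (assumed output format).
    """
    avg_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return np.argmax(avg_probs, axis=1)

def qwk(y_true, y_pred):
    """Quadratic weighted kappa between true and predicted grades."""
    return cohen_kappa_score(y_true, y_pred, weights="quadratic")
```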
Distributed learning has shown great potential in medical image analysis. It allows the use of multi-center training data with privacy protection. However, the data distributions of local centers can differ from each other due to different imaging vendors and annotation protocols. Such variations degrade the performance of learning-based methods. To mitigate the influence, two groups of methods have been proposed for different aims, i.e., the global methods and the personalized methods. The former aims to improve the performance of a single global model on all test data from unseen centers (known as generic data), while the latter targets multiple models, one for each center (known as local data). However, little has been researched to achieve both goals simultaneously. In this work, we propose a new framework of distributed learning that bridges the gap between the two groups and improves the performance on both generic and local data. Specifically, our method decouples the predictions for generic data and local data via a distribution-conditioned adaptation matrix. Results on multi-center left atrial (LA) MRI segmentation show that our method demonstrates superior performance over existing methods on both generic and local data. Our code is available at https://github.com/key1589745/decouple_predict
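The abstract gives few details of the adaptation matrix, but one plausible reading is sketched below: a shared model produces generic logits, and a learnable per-center matrix maps them to local predictions. The conditioning and parameterization here are assumptions, not the released implementation (see the linked repository for the real code).

```python
# Hedged sketch: a center-conditioned class-by-class matrix adapts
# "generic" logits into "local" logits for a known center.
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    def __init__(self, num_classes, num_centers):
        super().__init__()
        # One learnable adaptation matrix per local center, initialized
        # to the identity so local == generic at the start of training.
        self.adapt = nn.Parameter(
            torch.eye(num_classes).repeat(num_centers, 1, 1))

    def forward(self, generic_logits, center_id=None):
        # generic_logits: (B, C, H, W) from the shared global model.
        if center_id is None:        # unseen center -> generic prediction
            return generic_logits
        A = self.adapt[center_id]    # (C, C) matrix for this center
        return torch.einsum("kc,bchw->bkhw", A, generic_logits)
```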
Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model. To further improve ASR performance and facilitate various production requirements, in this paper we present WeNet 2.0 with four important updates. (1) We propose U2++, a unified two-pass framework with bidirectional attention decoders, which includes future contextual information via a right-to-left attention decoder to improve the representative ability of the shared encoder and the performance during the rescoring stage. (2) We introduce an n-gram based language model and a WFST-based decoder into WeNet 2.0, promoting the use of rich text data in production scenarios. (3) We design a unified contextual biasing framework, which leverages user-specific context (e.g., contact lists) to provide rapid adaptation ability for production and improves ASR accuracy in both with-LM and without-LM scenarios. (4) We design a unified IO to support large-scale data for effective model training. In summary, the brand-new WeNet 2.0 achieves up to 10% relative recognition performance improvement over the original WeNet on various corpora and provides several important production-oriented features.
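A simplified sketch of the U2++ rescoring idea in update (1): n-best hypotheses from the first (CTC) pass are rescored by interpolating left-to-right and right-to-left attention decoder scores with the CTC score. The weights and function names below are illustrative assumptions, not WeNet's actual API.

```python
# Hedged sketch of second-pass rescoring with bidirectional decoders.
def rescore(hyps, ctc_scores, l2r_scores, r2l_scores,
            ctc_weight=0.5, reverse_weight=0.3):
    """Each *_scores[i] is the log-probability of hypothesis hyps[i]
    under the CTC pass, the left-to-right decoder, and the
    right-to-left decoder, respectively."""
    best, best_score = None, float("-inf")
    for hyp, ctc, l2r, r2l in zip(hyps, ctc_scores, l2r_scores, r2l_scores):
        # Interpolate the two attention decoders, then add the CTC score.
        att = (1.0 - reverse_weight) * l2r + reverse_weight * r2l
        score = att + ctc_weight * ctc
        if score > best_score:
            best, best_score = hyp, score
    return best
```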
Assessment of myocardial viability is essential in the diagnosis and treatment management of patients suffering from myocardial infarction, and classification of pathology on the myocardium is key to this assessment. This work defines a new task of medical image analysis, i.e., to perform myocardial pathology segmentation (MyoPS) combining three-sequence cardiac magnetic resonance (CMR) images, which was first proposed in the MyoPS challenge in conjunction with MICCAI 2020. The challenge provided 45 paired and pre-aligned CMR images, allowing algorithms to combine the complementary information from the three CMR sequences for pathology segmentation. In this article, we provide details of the challenge, survey the works of fifteen participants, and interpret their methods from five aspects, i.e., preprocessing, data augmentation, learning strategy, model architecture, and post-processing. In addition, we analyze the results with respect to different factors, in order to examine the key obstacles, explore the potential of solutions, and provide a benchmark for further research. We conclude that while promising results have been reported, the research is still in the early stage, and more in-depth exploration is needed before a successful application to the clinic. Note that the MyoPS data and evaluation tool continue to be available upon registration via its homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/myops20/).
Accurate cardiac computing, analysis, and modeling from multi-modality images are important for the diagnosis and treatment of cardiac disease. Late gadolinium enhancement magnetic resonance imaging (LGE MRI) is a promising technique to visualize and quantify myocardial infarction (MI) and atrial scars. Automating the quantification of MI and atrial scars can be challenging due to the low image quality and complex enhancement patterns of LGE MRI. Moreover, compared with other sequences, LGE MRIs with gold standard labels are particularly limited, which represents another obstacle for developing novel algorithms for the automatic segmentation and quantification of LGE MRIs. This chapter aims to summarize the state-of-the-art contributions to deep learning based multi-modality cardiac image analysis. First, we introduce two benchmark works for myocardial and pathology segmentation based on multi-sequence cardiac MRI. Second, two novel frameworks for left atrial scar segmentation and quantification from LGE MRI are presented. Third, we present three unsupervised domain adaptation techniques for cross-modality cardiac image segmentation.
A recent study has shown a phenomenon called neural collapse, in which the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to the discrimination of the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to this appealing structure in imbalanced semantic segmentation. Experimental results show that our method brings significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
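A hedged sketch of such a regularizer: fix a simplex equiangular tight frame (ETF) as the target geometry and penalize the deviation of the per-class feature centers from its vertices. The exact loss in the paper may differ; this only illustrates the idea (see the linked repository for the actual implementation).

```python
# Construct a simplex ETF target and pull class-mean features toward it.
import torch
import torch.nn.functional as F

def simplex_etf(num_classes, dim):
    """Vertices of a simplex equiangular tight frame, shape (C, dim)."""
    assert dim >= num_classes
    u, _ = torch.linalg.qr(torch.randn(dim, num_classes))  # orthonormal cols
    m = u @ (torch.eye(num_classes)
             - torch.ones(num_classes, num_classes) / num_classes)
    m = m * (num_classes / (num_classes - 1)) ** 0.5
    return m.t()  # rows are the C equiangular, maximally separated directions

def center_regularizer(class_means, etf):
    """Encourage per-class feature centers (C, dim) to align with the ETF."""
    cos = F.cosine_similarity(class_means, etf, dim=1)
    return (1.0 - cos).mean()
```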
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only image-level labels. Most existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of object and context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. Besides, the use of CAM also brings a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and in addressing the dilemma between classification and localization performance.
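For reference, a minimal sketch of the Class Activation Mapping that KD-CI-CAM builds on: the classifier weights of the target class re-weight the final convolutional feature maps. Variable shapes are illustrative.

```python
# Standard CAM: weight the last conv feature maps by the class's
# linear-classifier weights (applied after global average pooling).
import torch

def class_activation_map(features, fc_weight, class_idx):
    """features:  (C_feat, H, W) output of the last conv layer.
       fc_weight: (num_classes, C_feat) weights of the linear classifier.
    """
    w = fc_weight[class_idx]                      # (C_feat,)
    cam = torch.einsum("c,chw->hw", w, features)  # weighted sum of maps
    cam = torch.relu(cam)
    return cam / (cam.max() + 1e-8)               # normalize to [0, 1]
```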
Witnessing the impressive achievements of pre-training techniques on large-scale data in the fields of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit to mitigate the sample inefficiency problem of visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, rendering predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing the photometric error based on the current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and is thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, with improvements ranging from 2% to over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
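A hedged sketch of the photometric error used in this kind of self-supervised geometric modeling: the current frame is compared with a neighboring frame warped into its view using the predicted depth and ego-motion. The warping step is abstracted away here, and the SSIM/L1 weighting (alpha = 0.85) is a conventional choice from the monocular-depth literature, not necessarily PPGeo's.

```python
# Combined SSIM + L1 photometric reconstruction error between the target
# frame and a depth/pose-warped source frame.
import torch
import torch.nn.functional as F

def photometric_error(target, warped, alpha=0.85):
    """target, warped: (B, 3, H, W) images scaled to [0, 1]."""
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    l1 = (target - warped).abs().mean(dim=1, keepdim=True)
    # Local statistics over 3x3 windows for a simple SSIM estimate.
    mu_t = F.avg_pool2d(target, 3, 1, 1)
    mu_w = F.avg_pool2d(warped, 3, 1, 1)
    var_t = F.avg_pool2d(target ** 2, 3, 1, 1) - mu_t ** 2
    var_w = F.avg_pool2d(warped ** 2, 3, 1, 1) - mu_w ** 2
    cov = F.avg_pool2d(target * warped, 3, 1, 1) - mu_t * mu_w
    ssim = ((2 * mu_t * mu_w + c1) * (2 * cov + c2)) / (
        (mu_t ** 2 + mu_w ** 2 + c1) * (var_t + var_w + c2))
    ssim_loss = ((1 - ssim) / 2).clamp(0, 1).mean(dim=1, keepdim=True)
    return (alpha * ssim_loss + (1 - alpha) * l1).mean()
```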
In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
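The abstract does not specify the grounding loss, but a contrastive word-region alignment of the following form is one plausible sketch: mask-query embeddings from the Mask Transformer are aligned with the embeddings of object nouns extracted from the paired caption. All shapes, the temperature, and the assignment rule are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a caption-grounding loss: each caption noun should be
# "claimed" by its best-matching mask-query embedding.
import torch
import torch.nn.functional as F

def grounding_loss(query_emb, noun_emb, tau=0.07):
    """query_emb: (Q, D) mask/query embeddings for one image.
       noun_emb:  (N, D) text embeddings of the caption's object nouns.
    """
    q = F.normalize(query_emb, dim=1)
    n = F.normalize(noun_emb, dim=1)
    sim = q @ n.t() / tau                        # (Q, N) similarity matrix
    # Softmax over queries per noun; take the best query's log-probability.
    per_noun = -F.log_softmax(sim, dim=0).max(dim=0).values
    return per_noun.mean()
```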
Nearest-Neighbor (NN) classification has been proven to be a simple and effective approach for few-shot learning. The query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false predictions if the samples in the support set happen to lie around the distribution boundary of different classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique, which utilizes the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation that is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support sample based on its similarity to the different base prototypes. Then, we perform NN classification using the discretely calibrated support data. Results from extensive experiments on various datasets show that our efficient non-learning-based method can outperform, or is at least comparable to, SOTA methods that require additional learning steps.
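A hedged sketch of prior-driven calibration in the spirit described above: each support feature is shifted toward the base-class prototypes it most resembles before NN classification. The top-k selection and mixing weight are illustrative assumptions, not the paper's exact operation.

```python
# Calibrate support features with base-class prototypes, then classify
# queries by nearest calibrated support sample.
import torch
import torch.nn.functional as F

def calibrate_support(support, base_protos, k=2, alpha=0.7):
    """support:     (S, D) support features from the pretrained backbone.
       base_protos: (B, D) mean features (prototypes) of the base classes.
    """
    sim = F.normalize(support, dim=1) @ F.normalize(base_protos, dim=1).t()
    topk_sim, topk_idx = sim.topk(k, dim=1)            # (S, k)
    w = torch.softmax(topk_sim, dim=1)                 # similarity weights
    prior = torch.einsum("sk,skd->sd", w, base_protos[topk_idx])
    return alpha * support + (1 - alpha) * prior       # calibrated supports

def nn_classify(query, calibrated_support, support_labels):
    sim = F.normalize(query, dim=1) @ F.normalize(calibrated_support, dim=1).t()
    return support_labels[sim.argmax(dim=1)]
```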